ABSTRACT

In  this  paper,  we  propose  an  approach  to  manage  network  resources  for  a  Direct  Sequence  Code  Division  Multiple 
Access  (DS-CDMA)  visual  sensor  network  where  nodes  monitor  scenes  with  varying  levels  of  motion.  It  uses  cross-layer 
optimization  across  the  physical  layer,  the  link  layer  and  the  application  layer.  Our  technique  simultaneously  assigns  a 
source  coding  rate,  a  channel  coding  rate,  and  a  power  level  to  all  nodes  in  the  network  based  on  one  of  two  criteria  that 
maximize  the  quality  of  video  of  the  entire  network  as  a  whole,  subject  to  a  constraint  on  the  total  chip  rate.  One  criterion 
results  in  the  minimal  average  end-to-end  distortion  amongst  all  nodes,  while  the  other  criterion  minimizes  the  maximum 
distortion  of  the  network.  Our  approach  allows  one  to  determine  the  capacity  of  the  visual  sensor  network  based  on  the 
number of nodes and the quality of video that must be transmitted. For bandwidth-limited applications, one can also determine
the minimum bandwidth needed to accommodate a number of nodes with a specific target chip rate. Video captured
by a sensor node camera is encoded at the node using the H.264 video codec and decoded by a centralized control unit at the network
layer. To reduce the computational complexity of the solution, Universal Rate-Distortion Characteristics (URDCs) are
obtained  experimentally  to  relate  bit  error  probabilities  to  the  distortion  of  corrupted  video.  Bit  error  rates  are  found  first 
by  using  Viterbi’s  upper  bounds  on  the  bit  error  probability  and  second,  by  simulating  nodes  transmitting  data  spread  by 
Total  Square  Correlation  (TSC)  codes  over  a  Rayleigh-faded  DS-CDMA  channel  and  receiving  that  data  using  Auxiliary 
Vector  (AV)  filtering. 


1.  INTRODUCTION 

Current  wireless  networking  solutions  do  not  always  provide  adequate  support  for  multimedia  applications  because  most 
are  designed  around  the  layered  protocol  architecture  that  forms  the  foundation  of  networking  design.  With  the  rapid 
growth  of  wireless  networks,  the  question  arises  as  to  whether  this  architecture  is  still  optimal  for  wireless  networks,  as 
well.1  Wireless  networks  are  essentially  different  from  their  wired  counterparts  in  a  number  of  ways.  The  main  difference 
between  wireless  channels  and  wired  channels  is  the  time-varying  nature  of  wireless  channels  that  leads  to  errors  due  to 
multipath  fading  and  cochannel  interference. 

In  this  paper,  we  consider  a  Direct  Sequence-Code  Division  Multiple  Access  (DS-CDMA)  visual  sensor  network  which 
we  assume  is  comprised  of  typically  low-weight  distributed  sensor  nodes  equipped  with  a  video  camera  to  survey  a  large 
area.  The  centralized  control  unit  at  the  network  layer  performs  channel  and  source  decoding  to  obtain  the  received  video 
from  each  node.  The  control  unit  transmits  information  to  the  nodes  in  order  to  request  changes  in  transmission  parameters, 
such  as  source  coding  rate,  channel  coding  rate,  and  transmission  power.  Applications  of  visual  sensor  networks  include 
surveillance,  automatic  tracking  and  signaling  of  intruders  within  a  physical  area,  command  and  control  of  unmanned 
vehicles, and environmental monitoring. Some visual sensor networks focus on image transmission and consider the trade-off
between image quality and energy consumption of different routing paths.2 Visual sensor networks that transmit video
are much more challenging than typical sensor networks due to the high bit rates and delay constraints. One approach
proposes a new wireless sensor node protocol stack that addresses the demanding nature of visual sensor networks,3 but
it does not consider the improvements that can be gained with cross-layer interactions.

In  our  set-up,  some  nodes  will  be  imaging  a  relatively  stationary  field  while  other  nodes  will  be  imaging  scenes  with  a 
high  level  of  motion  to  create  a  more  realistic  scenario  where  scenes  with  varying  levels  of  motion  exist.  Video  sequences 
with less motion can be source encoded at a lower bit rate while still yielding good picture quality. The centralized control
unit  at  the  network  layer  should  be  able  to  request  that  the  video  at  specific  nodes  be  transmitted  at  a  lower  bit  rate,  if  it  is 
deemed capable of still producing adequate video quality. Nodes that compress their video at a lower bit rate can
afford stronger channel coding and a lower transmission power level so that they will cause less interference to the other
nodes.  For  this  reason,  DS-CDMA  is  an  appropriate  choice  for  use  in  our  visual  sensor  network  set-up. 

For  video  transmission  over  wireless  channels,  it  is  imperative  to  maintain  a  low  bit  error  rate  in  order  to  guarantee  an 
adequate  level  of  viewing  quality.  The  increased  number  of  errors  caused  by  transmitting  over  wireless  channels  may  lead 
to a devastating degradation in the quality of the received video. The wireless channel’s performance can be exploited
by  dynamically  adjusting  the  transmission  parameters  through  cross-layer  interactions.  Although  Shannon’s  principle  of 
separability  states  that  it  is  possible  to  design  source  and  channel  coding  separately  without  loss  of  optimality,  the  principle 
assumes  that  the  source  and  channel  codes  are  of  arbitrarily  long  lengths.  Since  this  assumption  does  not  hold  in  practical 
situations  due  to  limitations  on  computational  power  and  processing  delays,  it  is  useful  to  consider  source  and  channel 
coding  jointly.4  Hence,  it  is  advantageous  for  the  lower  layers  of  the  stack  to  consider  the  end  application  for  resource 
management  and  protection  strategies,  and  conversely,  the  video  compression  algorithms  used  at  the  application  layer 
should  consider  the  error  protection,  scheduling,  etc.  used  at  lower  layers  to  maximize  the  end-to-end  video  quality.5 
Cross-layer  design  breaks  the  mold  of  the  traditional  layered  architecture  by  allowing  non-adjacent  layers  to  interact  with 
each  other. 

Various cross-layer schemes have been proposed over the years. One joint source-channel coding method
was developed for motion-compensated Discrete Cosine Transform (DCT)-based scalable video coding and transmission.6
Another method jointly considers source coding, channel resource allocation, and error concealment to provide a
framework that balances resource consumption with end-to-end video quality in packet-based video transmission.7 However,
these methods do not consider the physical layer. The physical layer is especially important in wireless networks because
it  ultimately  affects  the  performance  of  all  other  layers.  It  is  the  one  layer  where  data  is  physically  moved  across  the 
communications  network. 

Power  control  can  be  used  as  an  indirect  way  of  controlling  a  user’s  associated  error  probability  in  order  to  achieve  a 
certain  Quality-of-Service  (QoS).8  However,  using  power  control  alone  is  not  enough  to  ensure  adequate  quality  if  video 
is transmitted. A joint source coding-power control approach exists that allocates a source coding rate and an energy-per-bit
to multiple-access interference (MAI) density ratio to each user to maximize the per-cell capacity and the end-to-end QoS
for individual users.9 There is a joint source coding and power management scheme for wireless video multicast where
feedback  is  assumed,  and  Automatic  Repeat  Request  methods  are  used  instead  of  Forward  Error  Correction.10  Feedback  is 
not  feasible  in  all  communication  scenarios.  These  techniques  do  not  address  the  need  for  an  optimal  channel  coding  rate 
for  each  user  when  feedback  is  not  possible.  However,  for  transmission  over  error-prone  channels,  it  is  imperative  that  an 
acceptable  choice  for  the  channel  coding  rate  is  made  to  offer  an  adequate  level  of  protection  for  the  data  transmitted. 

Some cross-layer designs span the physical layer up through the application layer. Some jointly
consider the application, data link, and physical layers in video communications, but do not consider power management.11, 12, 13
However, a user’s transmission power determines the battery life in mobile devices, the interference experienced by
other  users,  and  the  network  capacity.  Another  technique  proposes  a  gradient-based,  distortion-aware  scheduling  scheme 
for  packet-based  video  transmission.14  There  is  a  cross-layer  scheme  that  allocates  the  power  level,  source  coding  rate, 
and  channel  coding  rate  while  maintaining  a  basic  QoS  to  less  capable  receivers  and  an  enhanced  QoS  for  more  capable 
receivers. That scheme uses Orthogonal Frequency-Division Multiplexing (OFDM), which is very sensitive to synchronization
errors in both timing and frequency.15

In this work, our multi-node cross-layer optimization technique accounts for network performance all the way from the
physical layer up to the application layer. At the application layer, the source coding rate for H.264/AVC video compression
is determined. At the data link layer, the Rate-Compatible Punctured Convolutional (RCPC) channel coding rate is selected.
And  at  the  physical  layer,  the  transmission  power  is  determined.  Our  algorithm  simultaneously  allocates  a  source  coding 
rate,  a  channel  coding  rate,  and  a  power  level  to  all  nodes  in  a  DS-CDMA  visual  sensor  network.  We  recognize  that 
different  nodes  of  a  sensor  network  can  have  different  source  coding  rate  requirements  due  to  different  scene  activity  and 
propose  to  jointly  optimize  all  nodes  using  one  of  two  criteria.  Our  first  criterion  results  in  the  Minimal  Average  end-to-end 
Distortion  (MAD)  over  all  the  nodes  in  the  network  while  our  second  criterion  Minimizes  the  Maximum  Distortion  (MMD) 
amongst  all  nodes.  Since  it  would  be  prohibitively  complex  to  experimentally  obtain  the  expected  distortion  for  each  node 
for  all  possible  combinations  of  source  coding  rates,  channel  coding  rates,  and  power  levels,  we  instead  have  chosen  to 
relax the optimality of the algorithm and utilize Universal Rate-Distortion Characteristics (URDCs). These characteristics
show the expected distortion as a function of the bit error probabilities, Pb, after channel coding.


2.  RESOURCE  ALLOCATION  USING  CROSS-LAYER  OPTIMIZATION 

This  work  considers  a  wireless  visual  sensor  network  that  utilizes  DS-CDMA.  After  source  and  channel  coding,  the  data  is 
spread  using  a  spreading  code  and  is  then  carrier-modulated.  But  even  if  the  spreading  codes  used  are  orthogonal  to  each 
other,  transmissions  from  one  node  cause  interference  to  the  other  nodes  due  to  possible  asynchronous  transmissions  and 
multipath  fading.  Since  all  nodes  transmit  on  the  same  frequency,  interference  within  a  channel  plays  a  significant  role  in 
determining  the  system’s  capacity  and  its  quality-of-service  (QoS).  The  transmit  power  for  each  node  must  be  minimized 
to  limit  the  interference  experienced  by  other  nodes  in  the  system.  This  is  critical  for  sensor  networks,  since  the  nodes  are 
typically  battery-operated  and  have  limited  energy.  But,  at  the  same  time,  a  node’s  power  must  be  high  enough  to  maintain 
its  own  quality.  Nodes  that  compress  their  video  at  a  lower  bit  rate  are  left  with  more  bits  for  channel  coding  and  can  afford 
to  transmit  at  a  lower  power,  thereby  causing  less  interference  to  the  other  nodes.16 

Assuming  there  are  K  nodes  in  a  synchronous  single-path  Binary  Phase  Shift  Keying  (BPSK)  channel,  the  received 
signal  can  be  expressed  as 

$$r(i) = A_1 b_1(i)\, s_1 + \sum_{k=2}^{K} A_k b_k(i)\, s_k + n_k \tag{1}$$

where Ak, bk(i), sk, and nk are the amplitude, symbol stream, signature sequence, and noise of node k, respectively. Ignoring
thermal noise and background noise due to spurious interference allows us to assume that N0, the noise spectral density, is
entirely due to interference from other nodes in the system:

$$\sum_{k=2}^{K} A_k b_k(i)\, s_k + n_k \approx \sum_{k=2}^{K} A_k b_k(i)\, s_k \tag{2}$$


It is reasonable to model the interference from the other nodes as a zero-mean Gaussian random
variable.17 Since user i has an associated power level in Watts, Si = Ei Ri, the energy-per-bit to MAI ratio becomes

$$\frac{E_i}{N_0} = \frac{S_i / R_i}{\sum_{j \neq i}^{N} S_j / W_t}, \qquad i = 1, 2, 3, \ldots, N \tag{3}$$

where Ei is the energy-per-bit, N0/2 is the two-sided noise power spectral density due to MAI in Watts/Hertz, Si is the
power of the node-of-interest in Watts, Ri is the transmitted bit rate in bits per second, Sj is the power of the interfering
node in Watts, and Wt is the total bandwidth in Hertz.9 Ri is taken to be the total bit rate used for source and channel
coding. Assuming N users, Ri can be expressed as


$$R_i = \frac{R_{s,i}}{R_{c,i}}, \qquad i = 1, 2, 3, \ldots, N \tag{4}$$

where Rs,i is the source coding rate for node i and Rc,i is the channel coding rate for node i. Since Rs,i has units of bits
per second and Rc,i is a dimensionless number, Ri will be measured in bits per second.6
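
As a minimal numerical sketch, equations (3) and (4) can be evaluated as follows, reading equation (3) as the ratio of Si/Ri to the summed power of the other nodes divided by Wt. The function names and the three-node example values are hypothetical and chosen only for illustration.

```python
def transmitted_bit_rate(source_rate_bps, channel_code_rate):
    """Equation (4): total bit rate R_i = R_{s,i} / R_{c,i} in bits per second."""
    return source_rate_bps / channel_code_rate


def energy_per_bit_to_mai(powers_w, bit_rates_bps, bandwidth_hz):
    """Equation (3): E_i/N_0 for every node, with all other nodes treated as noise."""
    total_power = sum(powers_w)
    ratios = []
    for s_i, r_i in zip(powers_w, bit_rates_bps):
        # N_0 in W/Hz: power of the interfering nodes spread over the bandwidth.
        interference_density = (total_power - s_i) / bandwidth_hz
        ratios.append((s_i / r_i) / interference_density)
    return ratios


# Hypothetical example: one node at (48 kbps, 1/3) and two nodes at (96 kbps, 2/3).
rates = [
    transmitted_bit_rate(48e3, 1 / 3),
    transmitted_bit_rate(96e3, 2 / 3),
    transmitted_bit_rate(96e3, 2 / 3),
]
ebn0 = energy_per_bit_to_mai([5.0, 10.0, 10.0], rates, 20e6)
```

In this example all three nodes end up with the same transmitted bit rate, and the node with the lowest power sees the lowest Ei/N0 because it faces the largest share of interfering power.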

In  the  first  part  of  this  work,  random  spreading  codes  are  assumed  and  no  specific  receiver  design  is  chosen.  In  the 
second  part  of  this  work,  spreading  codes  are  chosen  from  a  set  of  minimum  Total  Squared  Correlation  (TSC)  optimal 
Karystinos-Pados spreading codes18, 19 that are available for all lengths L and number of signatures K except K = L =
4n + 1, n = 1, 2, .... A flow-chart showing simple algorithms based on Hadamard matrix transformations for the design
of optimum binary signature sets that achieve the TSC bound for both underloaded and overloaded systems is given in Ref. 18.
After  carrier  demodulation,  chip-matched  filtering,  and  chip-rate  sampling,  auxiliary  vector  (AV)  filtering  provides  MAI- 
suppressing  despreading.  Under  small  sample  support  adaptation,  AV  filter  short-data-record  estimators  exhibit  superior 
bit  error  rate  (BER)  performance  in  comparison  to  least  mean  squares  (LMS),  recursive  least  squares  (RLS),  sample  matrix 
inversion  (SMI),  diagonally-loaded  SMI,  or  multistage  nested  Wiener  filter  implementations.20  21 

In  our  visual  sensor  network,  we  assume  the  nodes  are  equipped  with  video  cameras  that  monitor  various  fields.  We 
assume  each  node  has  the  computational  power  necessary  for  video  compression.  The  video  captured  by  the  cameras  is 
compressed using the H.264/AVC video coding standard. Typically when a number of video cameras are deployed to
survey  a  large  area,  some  of  the  nodes  will  be  imaging  stationary  fields  whereas  other  nodes  will  be  imaging  scenes  with 
more  motion.  Video  sequences  with  less  motion  can  be  source  encoded  at  a  lower  bit  rate  while  still  yielding  good  picture 
quality.  Furthermore,  it  is  possible  for  the  centralized  control  unit  to  request  that  the  video  of  specific  source  nodes  be 
transmitted  at  a  lower  picture  quality  and  bit  rate,  if  it  is  deemed  less  important.  We  require  each  node  to  transmit  at  the 
same  chip  rate.  So,  if  a  node  requires  a  lower  source  coding  rate,  a  larger  percentage  of  the  total  bit  rate  is  available  for 
channel coding. Stronger channel coding can correct more bit errors in a link layer packet. Thus, that particular node
can tolerate a higher bit error rate before decoding, which means that it can use a lower transmission power. This has the dual benefit that it
both  conserves  energy  for  the  node  and  reduces  the  interference  caused  to  the  other  nodes. 

In this work, we use Rate Compatible Punctured Convolutional (RCPC) codes for channel coding. Punctured convolutional
codes were originally developed to simplify Viterbi decoding for rate k/n with two branches arriving at each
node instead of 2^k branches. Puncturing is the process of deleting bits from the output sequence in a predefined manner
so  that  fewer  bits  are  transmitted  than  in  the  original  code.  The  idea  of  puncturing  was  extended  to  include  the  concept 
of  rate  compatibility.  Rate  compatibility  requires  that  a  higher-rate  code  be  a  subset  of  a  lower-rate  code,  or  that  a  lower- 
protection  code  be  embedded  into  a  higher-protection  code.  This  is  accomplished  by  puncturing  a  “mother”  code  of  rate 
1/n  to  achieve  higher  rates.  One  major  benefit  of  these  RCPC  codes  with  the  same  mother  code  is  that  they  all  can  be 
decoded  by  the  same  Viterbi  decoder.22 
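
The puncturing and rate-compatibility ideas described above can be sketched as follows. This is an illustration only: the puncturing patterns are hypothetical, assume a rate-1/4 mother code with a period of two information bits, and are not taken from published RCPC tables.

```python
from fractions import Fraction


def puncture(coded_bits, pattern):
    """Keep coded bit j only where the periodic puncturing pattern holds a 1."""
    period = len(pattern)
    return [bit for j, bit in enumerate(coded_bits) if pattern[j % period] == 1]


def code_rate(pattern, mother_outputs=4):
    """Information bits per period divided by transmitted bits per period."""
    info_bits = len(pattern) // mother_outputs
    return Fraction(info_bits, sum(pattern))


# Hypothetical patterns over 2 information bits (8 mother-code output bits).
PATTERN_RATE_1_3 = [1, 1, 1, 0, 1, 1, 1, 0]  # 6 of 8 bits sent -> rate 2/6 = 1/3
PATTERN_RATE_1_2 = [1, 1, 0, 0, 1, 1, 0, 0]  # 4 of 8 bits sent -> rate 2/4 = 1/2
PATTERN_RATE_2_3 = [1, 1, 0, 0, 1, 0, 0, 0]  # 3 of 8 bits sent -> rate 2/3


def is_rate_compatible(high_rate_pattern, low_rate_pattern):
    """Every bit sent by the higher-rate code is also sent by the lower-rate code."""
    return all(lo >= hi for hi, lo in zip(high_rate_pattern, low_rate_pattern))
```

Because each higher-rate pattern keeps a subset of the bits kept by the next lower-rate pattern, a decoder built for the mother code can handle all three rates by treating the punctured positions as erasures.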

Using  RCPC  codes  allows  us  to  utilize  Viterbi’s  upper  bounds  on  the  bit  error  probability,  Pb,  given  by 

$$P_b \leq \frac{1}{P} \sum_{d = d_{free}}^{\infty} c_d P_d \tag{5}$$

where P is the period of the code, dfree is the free distance of the code, cd is the information error weight, and Pd is the
probability that the wrong path at distance d is selected.22 An AWGN channel with binary phase-shift keying (BPSK)
modulation has a Pd given by

$$P_d = \frac{1}{2}\,\mathrm{erfc}\!\left(\sqrt{d\, R_c\, \frac{E_b}{N_0}}\right) \tag{6}$$

where Rc is the channel coding rate and Eb/N0 is the energy-per-bit normalized to the single-sided noise spectral density
measured in Watts/Hertz.
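
The bound in equation (5) can be evaluated numerically once a distance spectrum is available. The sketch below assumes the standard pairwise error expression for BPSK over AWGN, Pd = 0.5 erfc(sqrt(d Rc Eb/N0)), and a hypothetical, truncated list of weights cd. With a truncated sum the result is an approximation of the bound rather than a strict bound.

```python
import math


def pairwise_error_probability(d, code_rate, ebn0_linear):
    """P_d for BPSK over AWGN: 0.5 * erfc(sqrt(d * R_c * E_b/N_0))."""
    return 0.5 * math.erfc(math.sqrt(d * code_rate * ebn0_linear))


def viterbi_upper_bound(error_weights, d_free, period, code_rate, ebn0_linear):
    """Equation (5), truncated to the distance-spectrum terms that are supplied.

    error_weights[k] is the weight c_d for distance d = d_free + k.
    """
    total = 0.0
    for k, c_d in enumerate(error_weights):
        total += c_d * pairwise_error_probability(d_free + k, code_rate, ebn0_linear)
    # A probability cannot exceed 1, so clamp the (possibly loose) bound.
    return min(1.0, total / period)


# Hypothetical distance spectrum, for illustration only (not from a published table).
C_D = [2, 0, 12, 0, 40]
bound_low_snr = viterbi_upper_bound(C_D, d_free=6, period=8, code_rate=0.5, ebn0_linear=2.0)
bound_high_snr = viterbi_upper_bound(C_D, d_free=6, period=8, code_rate=0.5, ebn0_linear=4.0)
```

Raising Eb/N0 lowers every Pd term, so the computed bound decreases monotonically with the energy-per-bit ratio.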

A  centralized  control  unit  at  the  network  layer  determines  how  network  resources  should  be  allocated  amongst  the 
nodes.  It  can  request  changes  in  transmission  parameters,  such  as  the  source  coding  rates,  channel  coding  rates,  and 
transmission  power  levels.  There  are  two  criteria  we  will  utilize  to  optimally  allocate  the  network  resources  to  each  node 
in  the  network.  The  constraint  for  both  criteria  is  that  the  chip  rate  be  the  same  for  all  nodes.  The  first  criterion  we  will 
employ  can  be  formally  stated  as  follows:  Given  an  overall  chip  rate,  Rbudget ,  optimally  allocate  a  source  coding  rate,  Rs, 
a  channel  coding  rate,  Rc,  and  a  power  level,  S,  to  all  nodes  such  that  the  overall  end-to-end  distortion  Dave  over  all  nodes 
is  minimized 

$$\min D_{ave} \quad \text{subject to} \quad R_{chip} = R_{budget} \tag{7}$$

where Rchip is the chip rate for each node and Dave is the resulting expected distortion averaged over all nodes in the
system, which is due to both source coding errors and channel errors. Assuming N nodes, Dave is expressed by

$$D_{ave} = \frac{1}{N} \sum_{i=1}^{N} E\left[D_{s+c,i}\right] \tag{8}$$

where E[Ds+c,i] is the expected distortion for user i. The distortion due to source coding is a result of the quantization
process  and  is  deterministic.  However,  the  distortion  due  to  channel  errors  is  stochastic.  Thus,  the  total  distortion  for  each 
user  is  also  stochastic,  and  we  use  its  expected  value. 

The second criterion we will use to allocate resources to the nodes in the network minimizes the maximum distortion:

$$\min \left\{ \max_{i} D_{s+c,i} \right\} \quad \text{subject to} \quad R_{chip} = R_{budget} \tag{9}$$

where Rchip is the chip rate for each node. Our constraint is again that the chip rate be the same for all DS-CDMA
nodes. This formulation assumes that the videos from all sensors are equally important, but allows sensors that image
low-motion scenes to use a lower source coding rate. This criterion guarantees fairness among all sensors, since we are
minimizing the worst distortion among all sensors. The problem is a discrete optimization problem, that is, Rs,i, Rc,i, and
Si can only take values from discrete sets Rs, Rc, and S, respectively, i.e., Rs,i ∈ Rs, Rc,i ∈ Rc, and Si ∈ S.6, 16
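
The difference between the two criteria can be illustrated with a small sketch. The function names are hypothetical. The per-class distortions are the 50 low-motion / 50 high-motion rows of Tables 2 and 4 (target chip rate 96000 c/s), treated here simply as two candidate allocations.

```python
def mad_objective(expected_distortions):
    """Average expected distortion over all nodes, the quantity MAD minimizes."""
    return sum(expected_distortions) / len(expected_distortions)


def mmd_objective(expected_distortions):
    """Worst expected distortion over all nodes, the quantity MMD minimizes."""
    return max(expected_distortions)


def choose(candidates, objective):
    """Return the name of the candidate allocation with the smallest objective."""
    return min(candidates, key=lambda name: objective(candidates[name]))


# Expected distortion per node: 50 low-motion nodes followed by 50 high-motion nodes.
candidates = {
    "table2_row": [9.7] * 50 + [35.1] * 50,   # allocation reported under MAD
    "table4_row": [14.6] * 50 + [32.2] * 50,  # allocation reported under MMD
}

best_for_mad = choose(candidates, mad_objective)
best_for_mmd = choose(candidates, mmd_objective)
```

The first allocation has the lower average distortion, while the second has the lower worst-case distortion, so the two criteria select different allocations for the same network.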

Since  it  would  be  prohibitively  complex  to  experimentally  obtain  the  expected  distortion  for  each  node  for  all  possible 
combinations  of  source  coding  rates,  channel  coding  rates,  and  power  levels,  we  instead  have  chosen  to  relax  the  optimality 
of  the  algorithm  and  utilize  Universal  Rate-Distortion  Characteristics  (URDCs).  These  characteristics  show  the  expected 
distortion as a function of the bit error probabilities, Pb, after channel coding. In the first part of this work, Pb is calculated
using equations (3)-(6) for the set of source coding rates, channel coding rates and power levels. In equation (4), Ri is
found for the sets of Rs and Rc. These values are substituted into equation (3) for the set of S. The results for Ei/N0 are
plugged in for Eb/N0 in equation (6) to obtain Pd. Finally, the resulting values for Pd are substituted into equation (5) to
obtain the upper bound on the Pb. This upper bound on the Pb acts as a reference for the performance of channel coding
over the specified channel with the given parameters. In the second part of this work, Pb is found through simulating video
transmission  in  a  Rayleigh  fading  environment  with  interferers  transmitting  video  data. 

We calculate an RTP packet loss rate (PLR) from a certain BER, drop packets from the H.264 bitstream according to
the RTP PLR, and pass the corrupted H.264 bitstream to the H.264 decoder to calculate the distortion of the uncompressed
video. We assume that we know when a packet has an error, and we manually drop packets with any errors from the
H.264 encoded video stream, in accordance with the RTP PLR calculated from the BER. We then calculate the distortion
of  this  “corrupted”  video  stream.  This  creates  the  relation  between  each  BER  used  in  the  URDCs  and  the  distortion  of  a 
packet-based  video  stream  with  packet  errors. 
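
The mapping from BER to PLR is not spelled out above. One common choice, sketched below, assumes independent bit errors and a fixed packet size, so that a packet is lost whenever at least one of its bits is in error. Both assumptions, the function names, and the 100-byte packet size are hypothetical.

```python
import random


def packet_loss_rate(ber, packet_bits):
    """Probability that a packet of packet_bits bits has at least one bit error.

    Assumes independent bit errors (an assumption, not stated in the text).
    """
    return 1.0 - (1.0 - ber) ** packet_bits


def drop_packets(packets, plr, seed=0):
    """Keep each packet independently with probability 1 - plr."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() >= plr]


# Hypothetical example: 100-byte RTP packets at a bit error rate of 1e-5.
plr = packet_loss_rate(1e-5, 8 * 100)
surviving = drop_packets(list(range(1000)), plr)
```

The surviving packets would then be passed to the decoder, and repeating the experiment with different seeds gives the averaged distortion for that bit error rate.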

We  assume  the  following  model  for  the  URDC  for  each  user  i 


$$D_{s+c,i} = a \left[ \log_{10} \left( \frac{1}{P_b} \right) \right]^{b} \tag{10}$$


where a and b are such that the square of the approximation error is minimized.23, 24 Thus, instead of calculating the
URDCs based on experimental results for every possible Pb, we experimentally calculate the expected distortion
for a few packet loss rates associated with specific bit error rates, Pb. We then use the model, given in equation (10), to
approximate the distortion for other bit error rates. The distortion for a particular user i, Ds+c,i, given a particular source
coding rate, Rs,i, is a function of the bit error rate. Therefore, URDCs will give a family of Ds+c,i versus 1/Pb curves
given a set of source coding rates for each type of node.23, 24
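
A minimal sketch of fitting the two URDC parameters is given below. It assumes the model has the form D = a [log10(1/Pb)]^b and fits a and b by linear least squares in the log domain, which minimizes the squared error of ln D rather than of D itself. The sample points are synthetic and the function names are hypothetical.

```python
import math


def urdc_distortion(pb, a, b):
    """Assumed URDC form: D = a * (log10(1 / pb)) ** b."""
    return a * math.log10(1.0 / pb) ** b


def fit_urdc(pb_samples, distortion_samples):
    """Fit a and b using ln D = ln a + b * ln(log10(1/pb)), a straight line."""
    xs = [math.log(math.log10(1.0 / pb)) for pb in pb_samples]
    ys = [math.log(d) for d in distortion_samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                         # slope of the fitted line
    a = math.exp(mean_y - b * mean_x)     # intercept gives ln a
    return a, b


# Synthetic "measurements" generated from known parameters (illustration only).
pb_points = [1e-3, 1e-4, 1e-5, 1e-6, 1e-7]
measured = [urdc_distortion(pb, a=900.0, b=-2.5) for pb in pb_points]
a_hat, b_hat = fit_urdc(pb_points, measured)
```

In practice one such fit would be carried out per source coding rate and per motion level, giving the family of curves described above.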

We  perform  the  optimization  procedure  using  the  proposed  model  for  URDCs.  The  data  points  used  to  obtain  the 
parameters a and b are obtained by corrupting the video stream with packet errors based on a calculated Pb, decoding the
corrupted  video  bit  stream  with  the  H.264/AVC  codec,  calculating  the  distortion,  repeating  this  experiment  300  times  and 
then  taking  the  average  distortion.  We  assume  that  there  are  two  possible  motion  levels  viewed  by  the  sensor  nodes,  low 
motion  and  high  motion.  The  “Akiyo”  sequence  is  used  to  represent  a  low-motion  node,  and  the  “Foreman”  sequence  is 
used  to  represent  a  high-motion  node.  It  is  necessary  to  have  two  sets  of  URDC  curves,  one  for  each  level  of  motion.  The 
characteristics  were  obtained  for  both  video  sequences  at  a  frame  rate  of  15  f/s. 

We  use  BPSK  modulation  and  RCPC  codes  with  mother  code  rate  1/4  for  channel  coding.22  We  first  assume  random 
spreading  codes  with  the  same  processing  gain  for  all  nodes,  so  our  constraint  that  the  chip  rate  be  the  same  for  all 
DS-CDMA  nodes  translates  into  a  constraint  on  the  transmitted  bit  rate  given  in  (4).  We  examine  various  target  chip 
rate  constraints  at  192000  c/s,  144000  c/s,  96000  c/s,  and  48000  c/s.  The  set  of  admissible  source  coding  rates  and 
corresponding  channel  coding  rates  for  the  various  chip  rates  are 


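$$R_{chip} = 192000 \text{ c/s} \;\rightarrow\; (R_s, R_c) \in \{(64 \text{ kbps}, 1/3),\ (96 \text{ kbps}, 1/2),\ (128 \text{ kbps}, 2/3)\} \tag{11}$$

$$R_{chip} = 144000 \text{ c/s} \;\rightarrow\; (R_s, R_c) \in \{(48 \text{ kbps}, 1/3),\ (72 \text{ kbps}, 1/2),\ (96 \text{ kbps}, 2/3)\} \tag{12}$$

$$R_{chip} = 96000 \text{ c/s} \;\rightarrow\; (R_s, R_c) \in \{(32 \text{ kbps}, 1/3),\ (48 \text{ kbps}, 1/2),\ (64 \text{ kbps}, 2/3)\} \tag{13}$$

$$R_{chip} = 48000 \text{ c/s} \;\rightarrow\; (R_s, R_c) \in \{(16 \text{ kbps}, 1/3),\ (24 \text{ kbps}, 1/2),\ (32 \text{ kbps}, 2/3)\} \tag{14}$$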
The power levels in Watts were chosen from S ∈ {5, 10, 15}. The total bandwidth, Wt, was set to 20 MHz.
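
Each admissible pair is meant to give the same transmitted bit rate Rs/Rc from equation (4). The sketch below enumerates the source coding rate that goes with each channel coding rate, assuming, as the listed numbers suggest, that the transmitted bit rate in kbps equals the target chip rate in thousands of chips per second. The function name is hypothetical.

```python
from fractions import Fraction

CHANNEL_CODE_RATES = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]
POWER_LEVELS_W = [5, 10, 15]


def admissible_pairs(target_rate_kbps):
    """Pairs (R_s in kbps, R_c) such that R_s / R_c equals the target rate."""
    return [(target_rate_kbps * rc, rc) for rc in CHANNEL_CODE_RATES]


def search_space_size(target_rate_kbps):
    """Number of (R_s, R_c, S) triples available to one class of nodes."""
    return len(admissible_pairs(target_rate_kbps)) * len(POWER_LEVELS_W)


pairs_144 = admissible_pairs(144)
pairs_96 = admissible_pairs(96)
```

With three rate pairs and three power levels there are nine candidate triples per node class, so an exhaustive search over two node classes evaluates 81 combinations per target chip rate.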

In Tables 1-4, we show how the network resources should be assigned for various numbers of the two types of nodes
and for different target chip rates. The low-motion nodes’ source coding rate in bits per second, channel coding rate, and power
level in Watts are represented by Rs1, Rc1, and S1, respectively, and the high-motion nodes’ parameters are represented by
Rs2, Rc2, and S2. The number of low-motion nodes is given under the column “Low”, and the number of high-motion nodes is
given under the column “High”. “MAD” corresponds to the method of Minimizing the Average end-to-end Distortion over
all users, and “MMD” corresponds to the technique of Minimizing the Maximum Distortion. In Tables 1-4, the total number
of nodes is varied while the two types of nodes are kept equally distributed: there are equal numbers of nodes viewing
low-motion scenes and nodes viewing high-motion scenes. Tables 1 and 2 use the MAD criterion
while Tables 3 and 4 utilize the MMD criterion. We give the resulting average end-to-end peak signal-to-noise ratio, PSNR, in dB for
the entire network as a measure of performance for the MAD experiments. We also use the minimum PSNR as a measure
of performance for the MMD experiments. The PSNR is calculated from the expected distortion

$$PSNR = 10 \log_{10} \left( \frac{255^2}{E\{D_{s+c}\}} \right) \tag{15}$$

where PSNR is the peak signal-to-noise ratio and E{Ds+c} is the expected distortion due to source and channel coding.
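
As a quick check, the conversion from expected distortion to PSNR can be sketched as follows, assuming the distortion is the mean squared error of 8-bit video (peak value 255). Under that assumption the function reproduces the Dave and PSNRave pairs listed in Tables 1-4. The function name is hypothetical.

```python
import math


def psnr_db(expected_distortion, peak=255.0):
    """PSNR in dB from an expected mean squared error, for a given peak value."""
    return 10.0 * math.log10(peak ** 2 / expected_distortion)


# Dave values taken from the first and last rows of Table 1.
psnr_first_row = round(psnr_db(6.9), 1)   # 39.7
psnr_last_row = round(psnr_db(46.4), 1)   # 31.5
```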


Table 1. MAD with Equal Distributions of Node Types: Target chip rate = 144000 c/s

Low   (Rs1, Rc1, S1)    Ds+c,1   High   (Rs2, Rc2, S2)    Ds+c,2   Dave   PSNRave
10    (96k, 2/3, 15)    1.8      10     (96k, 2/3, 15)    12.1     6.9    39.7 dB
30    (96k, 2/3, 10)    4.7      30     (96k, 2/3, 15)    16.0     10.4   38.0 dB
50    (48k, 1/3, 5)     11.9     50     (96k, 2/3, 10)    20.6     16.3   36.0 dB
70    (48k, 1/3, 5)     20.0     70     (96k, 2/3, 10)    32.8     26.4   33.9 dB
90    (48k, 1/3, 10)    24.5     90     (48k, 1/3, 15)    68.3     46.4   31.5 dB

Table 2. MAD with Equal Distributions of Node Types: Target chip rate = 96000 c/s

Low   (Rs1, Rc1, S1)    Ds+c,1   High   (Rs2, Rc2, S2)    Ds+c,2   Dave   PSNRave
10    (64k, 2/3, 15)    3.0      10     (64k, 2/3, 15)    22.4     12.7   37.1 dB
30    (48k, 1/2, 5)     7.9      30     (64k, 2/3, 15)    23.5     15.7   36.2 dB
50    (48k, 1/2, 5)     9.7      50     (64k, 2/3, 10)    35.1     22.4   34.6 dB
70    (48k, 1/2, 10)    11.7     70     (48k, 1/2, 15)    49.7     30.7   33.3 dB
90    (32k, 1/3, 10)    19.8     90     (48k, 1/2, 15)    58.9     39.3   32.2 dB

Table 3. MMD with Equal Distributions of Node Types: Target chip rate = 144000 c/s

Low   (Rs1, Rc1, S1)    Ds+c,1   High   (Rs2, Rc2, S2)    Ds+c,2   Dave   PSNRave
10    (72k, 1/2, 15)    1.8      10     (96k, 2/3, 15)    12.1     6.9    39.7 dB
30    (48k, 1/3, 5)     9.6      30     (96k, 2/3, 15)    14.5     12.1   37.3 dB
50    (48k, 1/3, 5)     17.9     50     (96k, 2/3, 15)    18.9     18.4   35.5 dB
70    (48k, 1/3, 5)     20.0     70     (96k, 2/3, 10)    32.8     26.4   33.9 dB
90    (48k, 1/3, 10)    24.5     90     (48k, 1/3, 15)    68.3     46.4   31.5 dB

We see that in most MAD cases, high-motion nodes are assigned a higher source coding rate than the low-motion nodes. This is because, for a high-motion video sequence, the drop in the end-to-end distortion obtained by increasing the source coding rate is more significant than the effect of employing stronger channel coding. In contrast, the distortion of a low-motion video sequence remains relatively low even when the source coding rate is decreased, so a low-motion node can afford to transmit at a lower source coding rate in some cases. Since low-motion video sequences are more robust to errors, low-motion nodes are also assigned a lower power than the high-motion nodes in most cases, even when both have the same source and channel coding rates. The source coding rates, channel coding rates, and power levels are usually all the same only when the network resources are strained by a large number of nodes transmitting at high rates with limited bandwidth. In those cases, the algorithm is forced to assign the lower source coding rates and power levels to both types of nodes. We note that overloading the network results in a drastic drop in system performance, to the point of producing distortions so high that no viewable video would result. We see that the system starts to deteriorate when there are more than 150 total nodes.

Table 4. MMD with Equal Distributions of Node Types: Target chip rate = 96000 c/s

Low   (Rs1, Rc1, S1)   Ds+c,1   High   (Rs2, Rc2, S2)   Ds+c,2   Dave   PSNRave
10    (48k, 1/2, 15)   3.0      10     (48k, 1/2, 15)   22.4     12.7   37.1 dB
30    (48k, 1/2, 5)    7.9      30     (64k, 2/3, 15)   23.5     15.7   36.2 dB
50    (48k, 1/2, 5)    14.6     50     (64k, 2/3, 15)   32.2     23.4   34.4 dB
70    (32k, 1/3, 5)    25.7     70     (64k, 2/3, 15)   42.2     34.0   32.8 dB
90    (32k, 1/3, 5)    51.5     90     (48k, 1/2, 15)   50.5     51.0   31.1 dB
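The entries of Table 4 can be cross-checked with a few lines of arithmetic. The sketch below is not part of the original work and rests on three assumptions: the distortions are mean squared errors of 8-bit video (peak value 255), Dave is the plain mean of the two class distortions because both classes have the same number of nodes, and the chip rate constraint requires Rs/Rc to equal the target. The function and variable names are illustrative only.

```python
import math
from fractions import Fraction

TARGET_CHIP_RATE = 96000  # c/s, from the Table 4 caption

# Table 4 rows: (nodes per class, (Rs1, Rc1), D1, (Rs2, Rc2), D2, D_ave, PSNR_ave in dB)
ROWS = [
    (10, (48000, Fraction(1, 2)), 3.0, (48000, Fraction(1, 2)), 22.4, 12.7, 37.1),
    (30, (48000, Fraction(1, 2)), 7.9, (64000, Fraction(2, 3)), 23.5, 15.7, 36.2),
    (50, (48000, Fraction(1, 2)), 14.6, (64000, Fraction(2, 3)), 32.2, 23.4, 34.4),
    (70, (32000, Fraction(1, 3)), 25.7, (64000, Fraction(2, 3)), 42.2, 34.0, 32.8),
    (90, (32000, Fraction(1, 3)), 51.5, (48000, Fraction(1, 2)), 50.5, 51.0, 31.1),
]


def psnr_db(mse, peak=255.0):
    """PSNR in dB for a given mean squared error (assumes 8-bit video)."""
    return 10.0 * math.log10(peak ** 2 / mse)


def row_is_consistent(row, tol=0.1):
    """Check the chip rate, the average distortion, and the PSNR of one row."""
    _, (rs1, rc1), d1, (rs2, rc2), d2, d_ave, psnr_ave = row
    chip_ok = rs1 / rc1 == TARGET_CHIP_RATE and rs2 / rc2 == TARGET_CHIP_RATE
    mean_ok = abs((d1 + d2) / 2 - d_ave) <= tol   # table values are rounded
    psnr_ok = abs(psnr_db(d_ave) - psnr_ave) <= tol
    return chip_ok and mean_ok and psnr_ok


if __name__ == "__main__":
    for row in ROWS:
        print(row[0], "nodes per class:", row_is_consistent(row))
```

Under these assumptions every row passes, which also supports reading the first element of each triple as a source coding rate in kbps.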

In Figure 1, we show the average PSNR and the minimum PSNR of the overall network for various target chip rates under the two criteria. Figure 1(a) plots the average PSNR against the number of nodes in the network for all the target chip rates, with equal numbers of low-motion and high-motion nodes. Figure 1(b) plots the minimum PSNR versus the number of nodes in the network for different target chip rates. The minimum PSNR corresponds to the maximum distortion, which is minimized under the MMD criterion. Although the higher target chip rates initially yield the higher PSNR, in both figures the curves for the higher target chip rates also fall more steeply. This means that the network becomes overloaded sooner at higher chip rates.

Figure  1.  (a)  Average  PSNRs  versus  Number  of  Nodes  for  different  Target  Chip  Rates  [c/s]  using  MAD  (b)  Minimum  PSNRs  versus 
Number  of  Nodes  for  different  Target  Chip  Rates  [c/s]  using  MMD 


In Figures 2(a) and (c), we plot the average PSNR using MAD versus the total bandwidth, Wt, for the target chip rates of 48000 c/s and 192000 c/s, respectively. We plot the minimum PSNR using MMD versus Wt in Figures 2(b) and (d). Since these figures show curves for different numbers of total nodes, they can be used to determine the total bandwidth required to accommodate a desired number of nodes at a desired average end-to-end video quality level. We see a drastic drop in performance at low Wt. We also observe the relationship between the number of nodes and the Wt required to handle that number of nodes for a certain target chip rate.
Figure 2. (a) Average PSNRs using MAD versus Total Bandwidth for Target Chip Rate = 48000 c/s (b) Minimum PSNRs using MMD versus Total Bandwidth for Target Chip Rate = 48000 c/s (c) Average PSNRs using MAD versus Total Bandwidth for Target Chip Rate = 192000 c/s (d) Minimum PSNRs using MMD versus Total Bandwidth for Target Chip Rate = 192000 c/s
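Reading a required bandwidth off one of the Figure 2 curves amounts to finding the smallest bandwidth at which the curve reaches the desired quality. The sketch below illustrates that lookup; it is not part of the original work. The function name is hypothetical, and the sample points are placeholders in arbitrary bandwidth units, not values read from the figures.

```python
def min_bandwidth(curve, target_psnr_db):
    """Return the smallest sampled bandwidth whose PSNR meets the target.

    curve is a list of (bandwidth, psnr_db) samples for one node count and
    one target chip rate. Returns None if no sample reaches the target.
    """
    for bandwidth, psnr in sorted(curve):
        if psnr >= target_psnr_db:
            return bandwidth
    return None


# Placeholder samples (hypothetical, NOT taken from Figure 2).
example_curve = [(1.0, 24.0), (2.0, 31.5), (3.0, 35.0), (4.0, 36.0)]

if __name__ == "__main__":
    print(min_bandwidth(example_curve, 33.0))  # smallest sample with PSNR >= 33 dB
    print(min_bandwidth(example_curve, 40.0))  # target never reached
```

Only sampled points are considered; interpolating between samples would give a finer estimate if the curve is known to be monotonic.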


For the sake of comparison, Figure 3 shows similar graphs with various target chip rates for the 50-node case. As expected, at the higher target chip rates it takes more bandwidth to reach the maximum level of PSNR. The improvement in average PSNR between target chip rates 48000 c/s and 96000 c/s is significantly greater than the improvement between target chip rates 144000 c/s and 192000 c/s, even though the difference in chip rate is 48000 c/s in both cases. If there are bandwidth restrictions on the network, one can use these figures to determine the target chip rate that will result in the best average end-to-end PSNR overall for a given total bandwidth.


The results up to this point assumed random spreading codes, interference modeled as white Gaussian noise, and no specific receiver design. We now consider more specific parameters, namely Total Squared Correlation (TSC) codes, interleaving, a Rayleigh fading channel, and Auxiliary Vector (AV) filtering at the receiver, to validate the theoretical assumptions made in the first part of this work. Instead of utilizing the Viterbi upper bound on the probability of error, we find the probabilities of error by simulating actual simultaneous node transmissions. We transmit multiple nodes’ data, divided evenly between high-motion nodes and low-motion nodes. We spread the data with TSC codes, encode the spread data with RCPC codes, and interleave the encoded data. Next, we send the data over a Rayleigh fading channel with 3 multipaths. At the receiver, the data is demodulated and despread using the AV filter. The despread data is channel decoded with a Viterbi decoder and then passed through a deinterleaver. After repeated runs, the probability of error is determined.

In Figure 4, the RSC-AWGN method refers to using random spreading codes, interference modeled as white Gaussian noise, and no specific receiver, and hence to utilizing the Viterbi upper bound on the probability of error. The TSC-AV method refers to simulating transmission while incorporating TSC spreading codes, interleaving, a Rayleigh fading channel, and AV filtering at the receiver. For MAD, the TSC-AV method yields higher average PSNR values for very low and very high numbers of nodes, and hence better average end-to-end video quality, even though it uses a more realistic Rayleigh fading channel. The MMD curves follow each other’s behavior closely. TSC-AV uses spreading codes together with an interference suppression algorithm. Since the RSC-AWGN method uses random codes that have no interference suppression characteristics, all of the power transmitted by other nodes arrives as interference at the node of interest, so its performance is slightly lower at some points.
Figure 3. (a) Average PSNRs using MAD versus Total Bandwidth for 50 Nodes for different Target Chip Rates [c/s] (b) Minimum PSNRs using MMD versus Total Bandwidth for 50 Nodes for different Target Chip Rates [c/s]


Figure 4. (a) Comparing RSC-AWGN and TSC-AV methods using MAD for different Target Chip Rates [c/s] (b) Comparing RSC-AWGN and TSC-AV methods using MMD for different Target Chip Rates [c/s]
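Estimating a probability of error "after repeated runs", as done for the TSC-AV method, is a Monte Carlo procedure. The sketch below shows the idea on a deliberately simplified link and is not the simulation used in this work: it assumes a single user, uncoded BPSK, flat Rayleigh fading with perfect channel knowledge, and no TSC codes, RCPC coding, interleaving, multipath, or AV filtering. All names are illustrative. The closed-form bit error rate for this simplified link is included only as a sanity check.

```python
import math
import random


def simulate_ber(snr_db, n_bits=200_000, seed=0):
    """Monte Carlo bit error rate of coherent BPSK over flat Rayleigh fading."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)           # average Eb/N0 (linear)
    sigma = math.sqrt(1 / (2 * snr))    # noise std for unit-energy symbols
    errors = 0
    for _ in range(n_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0
        # Rayleigh amplitude with E[a^2] = 1: a^2 is exponential with mean 1.
        amplitude = math.sqrt(rng.expovariate(1.0))
        received = amplitude * symbol + rng.gauss(0.0, sigma)
        decided = 1 if received > 0 else 0
        errors += decided != bit
    return errors / n_bits


def rayleigh_bpsk_ber(snr_db):
    """Closed-form BER of coherent BPSK over flat Rayleigh fading."""
    snr = 10 ** (snr_db / 10)
    return 0.5 * (1 - math.sqrt(snr / (1 + snr)))


if __name__ == "__main__":
    for snr_db in (0.0, 5.0, 10.0):
        print(snr_db, simulate_ber(snr_db), rayleigh_bpsk_ber(snr_db))
```

In the full system, the inner loop would be replaced by the spreading, coding, channel, and receiver stages described above, and the resulting error probabilities would be mapped to distortion through the URDCs.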


3.  CONCLUSION 

In this paper, we present a cross-layer optimization algorithm that works across the physical layer, the data link layer, and the application layer of a wireless visual sensor network. The algorithm accounts for network performance all the way from the physical layer up to the application layer. At the application layer, we determine the source coding rate, Rs, for video compression. At the data link layer, we assign the channel coding rate, Rc. At the physical layer, we select the transmission power level, S. The algorithm shows how to distribute these parameters among all the nodes transmitting in the network. To create a realistic DS-CDMA visual sensor network, the nodes are assumed to image scenes with different levels of motion. By utilizing the parametric model for the URDCs, we compute each node’s expected distortion only for the probabilities of error calculated for a small number of source coding rates, channel coding rates, and power levels, and use the model to estimate the distortion for other probabilities of error. This reduces the computational complexity of the solution significantly. We present the combinations of {Rs, Rc, S} for each node that result in the minimal average end-to-end distortion over all nodes in the system and the combinations that minimize the maximum distortion. We also show how to determine the minimum total bandwidth needed to obtain a specific level of quality for the desired number of nodes and which target chip rate achieves the highest average PSNR for a given total bandwidth.
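For reference, the two criteria summarized above can be written compactly. The notation below is an assumption introduced for illustration: K denotes the number of nodes, E[D_{s+c,k}] the expected end-to-end distortion of node k, and R_target the target chip rate. The form of the constraint is inferred from the tabulated results, where Rs/Rc equals the target for every node.

```latex
% MAD: minimize the average end-to-end distortion over all K nodes
\min_{\{R_{s,k},\,R_{c,k},\,S_k\}} \; \frac{1}{K} \sum_{k=1}^{K} E\left[D_{s+c,k}\right]
\quad \text{subject to} \quad \frac{R_{s,k}}{R_{c,k}} = R_{\text{target}}, \quad k = 1,\dots,K

% MMD: minimize the maximum end-to-end distortion among the K nodes
\min_{\{R_{s,k},\,R_{c,k},\,S_k\}} \; \max_{1 \le k \le K} E\left[D_{s+c,k}\right]
\quad \text{subject to} \quad \frac{R_{s,k}}{R_{c,k}} = R_{\text{target}}, \quad k = 1,\dots,K
```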